Skip to content

Conversation

@mostlygeek
Copy link
Owner

@mostlygeek mostlygeek commented Oct 13, 2025

Fixes #348

Summary by CodeRabbit

  • Bug Fixes

    • Prevents premature unloading while requests are in flight, reducing interrupted responses.
    • Improves start/stop reliability under load by tightening synchronization and state handling.
    • Ensures graceful shutdown behavior and safer command stop sequencing.
    • More consistent tracking of last handled request time for TTL-based unloading.
  • Refactor

    • Centralizes state transitions and guarded access to internal command controls for stability.
  • Tests

    • Stabilizes proxy manager test by eliminating concurrent writes, ensuring header assertions occur after handler completion.
  • Chores

    • Adds race detection to the full test suite for improved concurrency safety.

- TestProxy_UnloadAfterTTL
- TestProcess_HTTPRequestsHaveTimeToFinish
- TestProcess_ShutdownInterruptsHealthCheck
- TestProcess_ExitInterruptsHealthCheck
- TestProcess_EnvironmentSetCorrectly
…ngHeader

This race condition only appeared in the test and not part of the
production code. The test was running the HTTP handler in a goroutine
and then immediately reading from httptest.ResponseRecorder while the
handler was still potentially writing to it. The fix was to remove the
time.Sleep and use channels to signal progress
@coderabbitai
Copy link

coderabbitai bot commented Oct 13, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds concurrency safety to process lifecycle and request tracking: introduces mutexes, atomic in-flight counters, and accessor methods; guards command cancellation and wait channels; updates TTL unload, start/stop/wait flows, and ProxyRequest to use new synchronization. Also updates a test to avoid concurrent recorder access and enables race detector in Makefile.

Changes

Cohort / File(s) Summary of changes
Process concurrency and state management
proxy/process.go
Added cmdMutex and lastRequestHandledMutex; introduced lastRequestHandled and inFlightRequestsCount with thread-safe accessors; added forceState and atomic waitStarting increments in swapState; guarded cancelUpstream and cmdWaitChan access; updated start/stop/wait flows and TTL unload logic to respect in-flight requests and use guarded state transitions; adjusted ProxyRequest to increment/decrement in-flight count and update lastRequestHandled.
Proxy manager test synchronization
proxy/proxymanager_test.go
Replaced context.WithCancel with context.WithTimeout(100ms) and run handler in a goroutine with a done channel; wait for completion/timeout before reading the response recorder to avoid concurrent writes; assertions unchanged.
Test runner configuration
Makefile
Updated test-all target to run go test -race -count=1 ./proxy/..., enabling the race detector for the proxy tests.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Proxy as ProxyRequest
  participant Proc as Process
  participant TTL as TTL Unloader

  Note over Proxy,Proc: Request tracking and TTL check use new sync primitives

  Client->>Proxy: HTTP request
  Proxy->>Proc: inFlightRequestsCount.Add(1)
  Proxy->>Proc: Serve request
  Proc-->>Proxy: Response
  Proxy->>Proc: setLastRequestHandled(now)
  Proxy->>Proc: inFlightRequestsCount.Add(-1)

  TTL->>Proc: Periodic TTL check
  alt inFlightRequestsCount > 0
    TTL-->>TTL: Skip unload
  else TTL expired and no inflight
    TTL->>Proc: Initiate unload sequence
  end
Loading
sequenceDiagram
  autonumber
  participant Controller
  participant Proc as Process
  participant Up as UpstreamCmd

  Note over Proc: State transitions and cmd lifecycle guarded by mutexes

  Controller->>Proc: swapState(StateStarting)
  Proc->>Proc: atomic waitStarting increment
  Proc->>Up: start command (access under cmdMutex)
  alt start succeeds
    Proc->>Proc: forceState(StateRunning)
  else start fails
    Proc->>Proc: forceState(StateStopped)
  end

  Controller->>Proc: stopCommand()
  Proc->>Proc: read/cancel upstream and close cmdWaitChan (under cmdMutex)
  Proc->>Up: cancel if non-nil
  Proc->>Proc: forceState(StateStopped/StateShutdown)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

enhancement

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “Fix race conditions in proxy.Process” succinctly and accurately captures the primary change of adding synchronization and atomic operations in proxy.Process to eliminate data races, making it clear to anyone reading the history what the PR addresses.
Linked Issues Check ✅ Passed The PR implements synchronization primitives, atomic counters, and safe accessors in proxy.Process and updates the test in proxymanager_test.go to eliminate concurrent writes, directly addressing the root causes of data races identified in issue #348 across the listed tests.
Out of Scope Changes Check ✅ Passed All modifications are focused on fixing data races in proxy.Process and related tests as specified in issue #348, with no unrelated changes introduced.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a3502b and c61ebbc.

📒 Files selected for processing (1)
  • Makefile (1 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
proxy/process.go (2)

314-337: TTL unload loop has two issues: potential leak and premature exit; also zero-time triggers immediate unload

  • time.Tick leaks (SA1015). Use time.NewTicker.
  • The loop returns if state != Ready; since the goroutine starts before Ready, it can exit before TTL ever runs.
  • lastRequestHandled starts at zero; time.Since(zero) is large, so it may unload immediately on first tick if no requests have completed.

Fix by initializing lastRequestHandled when enabling TTL, waiting until Ready, and using a ticker.

As per coding guidelines (staticcheck)

-        go func() {
-            maxDuration := time.Duration(p.config.UnloadAfter) * time.Second
-
-            for range time.Tick(time.Second) {
-                if p.CurrentState() != StateReady {
-                    return
-                }
-
-                // skip the TTL check if there are inflight requests
-                if p.inFlightRequestsCount.Load() != 0 {
-                    continue
-                }
-
-                if time.Since(p.getLastRequestHandled()) > maxDuration {
-                    p.proxyLogger.Infof("<%s> Unloading model, TTL of %ds reached", p.ID, p.config.UnloadAfter)
-                    p.Stop()
-                    return
-                }
-            }
-        }()
+        // Initialize to now so we don't unload immediately before first request completes.
+        p.setLastRequestHandled(time.Now())
+        go func() {
+            maxDuration := time.Duration(p.config.UnloadAfter) * time.Second
+            ticker := time.NewTicker(time.Second)
+            defer ticker.Stop()
+            for range ticker.C {
+                st := p.CurrentState()
+                // Exit when process stops/shuts down; otherwise wait until Ready
+                if st == StateStopped || st == StateShutdown {
+                    return
+                }
+                if st != StateReady {
+                    continue
+                }
+                // Skip while requests are in-flight
+                if p.inFlightRequestsCount.Load() != 0 {
+                    continue
+                }
+                if !p.getLastRequestHandled().IsZero() && time.Since(p.getLastRequestHandled()) > maxDuration {
+                    p.proxyLogger.Infof("<%s> Unloading model, TTL of %ds reached", p.ID, p.config.UnloadAfter)
+                    p.Stop()
+                    return
+                }
+            }
+        }()

347-357: Stop() waits before transitioning to Stopping → can hang under load

Stop() waits for in-flight requests while state remains Ready, allowing new requests to arrive and preventing the WaitGroup from draining. Transition to Stopping first to reject new requests, then wait and stop.

 func (p *Process) Stop() {
-    if !isValidTransition(p.CurrentState(), StateStopping) {
-        return
-    }
-
-    // wait for any inflight requests before proceeding
-    p.proxyLogger.Debugf("<%s> Stop(): Waiting for inflight requests to complete", p.ID)
-    p.inFlightRequests.Wait()
-    p.StopImmediately()
+    // Transition first to reject new requests, then wait and stop.
+    if curState, err := p.swapState(StateReady, StateStopping); err != nil {
+        p.proxyLogger.Infof("<%s> Stop(): Ready -> Stopping failed: %v (current: %v)", p.ID, err, curState)
+        return
+    }
+    p.proxyLogger.Debugf("<%s> Stop(): Waiting for inflight requests to complete", p.ID)
+    p.inFlightRequests.Wait()
+    p.stopCommand()
 }
🧹 Nitpick comments (2)
proxy/proxymanager_test.go (1)

1078-1099: Avoid unnecessary fixed 100ms wait; select on handler completion first

Current flow always waits for ctx timeout even if handler returns early. Use a select to prefer done, falling back to ctx.Done(), then ensure the goroutine has exited before asserting.

-            // Wait for either the handler to complete or context to timeout
-            <-ctx.Done()
-
-            // At this point, the handler has either finished or been cancelled
-            // Wait for the goroutine to fully exit before reading
-            <-done
+            // Prefer handler completion; otherwise wait for context timeout, then ensure exit
+            select {
+            case <-done:
+            case <-ctx.Done():
+                <-done
+            }
proxy/process.go (1)

379-387: Shutdown should mark Stopping to block new requests before killing upstream

Currently Shutdown does not transition to Stopping, so ProxyRequest can still accept new requests until the process actually exits. Mark Stopping first (for Ready/Starting), then stop and force Shutdown.

 func (p *Process) Shutdown() {
-    if !isValidTransition(p.CurrentState(), StateStopping) {
-        return
-    }
-
-    p.stopCommand()
-    // just force it to this state since there is no recovery from shutdown
-    p.forceState(StateShutdown)
+    cur := p.CurrentState()
+    switch cur {
+    case StateReady, StateStarting:
+        if _, err := p.swapState(cur, StateStopping); err != nil {
+            p.proxyLogger.Infof("<%s> Shutdown(): %s -> Stopping failed: %v", p.ID, cur, err)
+        }
+    default:
+        if !isValidTransition(cur, StateStopping) {
+            return
+        }
+    }
+    p.stopCommand()
+    p.forceState(StateShutdown) // terminal state
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5392783 and 2a3502b.

📒 Files selected for processing (2)
  • proxy/process.go (14 hunks)
  • proxy/proxymanager_test.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Fix all staticcheck-reported issues in Go code

Files:

  • proxy/proxymanager_test.go
  • proxy/process.go

@mostlygeek mostlygeek merged commit caf9e98 into main Oct 13, 2025
2 of 3 checks passed
@mostlygeek mostlygeek deleted the race-test branch October 13, 2025 23:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix data races

2 participants